Overview

Dataset statistics

Number of variables: 25
Number of observations: 19926
Missing cells: 284731
Missing cells (%): 57.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 3.8 MiB
Average record size in memory: 200.0 B
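The derived figures in this block can be cross-checked from the raw counts. The sketch below is only an arithmetic check using the numbers reported above; it assumes that "missing cells (%)" is missing cells divided by variables times observations, and that the memory total of 3.8 MiB is a rounded value.

```python
# Figures copied from the "Dataset statistics" block of this report.
n_variables = 25
n_observations = 19926
missing_cells = 284731

# Assumption: the percentage is taken over all cells (rows x columns).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 3.8 MiB is rounded, so the per-record size is approximate.
total_bytes = 3.8 * 1024 ** 2
avg_record_bytes = total_bytes / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Both results agree with the reported 57.2% and 200.0 B after rounding.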

Variable types

Text: 13
Categorical: 5
Numeric: 3
Boolean: 4

Alerts

original_amount is highly overall correlated with original_currency1 and 6 other fields [High correlation]
transaction_month is highly overall correlated with transaction_year and 1 other fields [High correlation]
transaction_year is highly overall correlated with transaction_month and 1 other fields [High correlation]
original_currency1 is highly overall correlated with original_amount and 5 other fields [High correlation]
trx_currency is highly overall correlated with transaction_month and 8 other fields [High correlation]
division is highly overall correlated with original_amount and 2 other fields [High correlation]
transaction_day_of_week is highly overall correlated with original_amount and 3 other fields [High correlation]
weekday_transaction is highly overall correlated with original_amount and 2 other fields [High correlation]
transaction_to_original_diff is highly overall correlated with original_amount and 3 other fields [High correlation]
currency_change is highly overall correlated with original_amount and 3 other fields [High correlation]
transaction_freq_gt_weekly is highly overall correlated with original_amount and 1 other fields [High correlation]
original_currency1 is highly imbalanced (93.2%) [Imbalance]
trx_currency is highly imbalanced (99.8%) [Imbalance]
division is highly imbalanced (53.4%) [Imbalance]
weekday_transaction is highly imbalanced (77.1%) [Imbalance]
transaction_to_original_diff is highly imbalanced (84.7%) [Imbalance]
currency_change is highly imbalanced (84.7%) [Imbalance]
transaction_freq_gt_weekly is highly imbalanced (99.8%) [Imbalance]
purpose has 12052 (60.5%) missing values [Missing]
merchant_name has 11854 (59.5%) missing values [Missing]
cost_center_wbls_element_order_description has 11865 (59.5%) missing values [Missing]
card_posting_date has 11854 (59.5%) missing values [Missing]
merchant_type_mcc has 11854 (59.5%) missing values [Missing]
merchant_type_description has 11854 (59.5%) missing values [Missing]
original_currency1 has 11855 (59.5%) missing values [Missing]
cost_center_wbls_element_order has 11857 (59.5%) missing values [Missing]
transaction_date has 11855 (59.5%) missing values [Missing]
transaction_amount has 11855 (59.5%) missing values [Missing]
trx_currency has 11855 (59.5%) missing values [Missing]
gl_account_description has 11855 (59.5%) missing values [Missing]
original_amount has 11855 (59.5%) missing values [Missing]
division has 11855 (59.5%) missing values [Missing]
gl_account has 11855 (59.5%) missing values [Missing]
batch_transaction_id has 11855 (59.5%) missing values [Missing]
transaction_gt_50 has 11855 (59.5%) missing values [Missing]
transaction_day_of_week has 11855 (59.5%) missing values [Missing]
transaction_month has 11856 (59.5%) missing values [Missing]
transaction_year has 11856 (59.5%) missing values [Missing]
weekday_transaction has 11856 (59.5%) missing values [Missing]
transaction_to_original_diff has 11856 (59.5%) missing values [Missing]
currency_change has 11856 (59.5%) missing values [Missing]
transaction_freq_gt_weekly has 11856 (59.5%) missing values [Missing]
original_amount is highly skewed (γ1 = 86.56852089) [Skewed]
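The imbalance percentages in these alerts are consistent with one minus the Shannon entropy of the class counts, normalized by the log of the number of classes. The sketch below is a check under that assumption, not a copy of the profiler's code; the function name `imbalance_score` is hypothetical, and the counts are the non-missing value counts reported for original_currency1 and trx_currency elsewhere in this report.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy (bits) of the counts.

    0.0 means perfectly balanced classes; values near 1.0 mean one class dominates.
    """
    k = len(counts)
    if k < 2:
        # Assumption: a single-class column is treated as "not imbalanced".
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# Non-missing value counts taken from this report.
original_currency1 = [7892, 175, 2, 1, 1]  # CAD, USD, CHF, GBP, "34.23"
trx_currency = [8070, 1]                   # CAD, "False"

print(f"original_currency1: {imbalance_score(original_currency1):.1%}")
print(f"trx_currency: {imbalance_score(trx_currency):.1%}")
```

Under this assumption the two scores round to the 93.2% and 99.8% shown in the alerts.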

Reproduction

Analysis started: 2023-08-17 14:40:22.776978
Analysis finished: 2023-08-17 14:40:30.534365
Duration: 7.76 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
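The reported duration follows directly from the two timestamps. A minimal check, using only the values printed above:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" block.
started = datetime.fromisoformat("2023-08-17 14:40:22.776978")
finished = datetime.fromisoformat("2023-08-17 14:40:30.534365")

duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```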

Variables

Distinct: 15941
Distinct (%): 80.0%
Missing: 0
Missing (%): 0.0%
Memory size: 155.8 KiB

Length

Max length: 324
Median length: 309
Mean length: 158.7844
Min length: 2

Characters and Unicode

Total characters: 3163938
Distinct characters: 79
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 13758
Unique (%): 69.0%

Sample

1st row1003
2nd row1002
3rd row895
4th row896
5th row894
Value                  Count    Frequency (%)
(blank)                23174    12.0%
recreation             11858    6.1%
forestry               11854    6.1%
and                    4028     2.1%
stores                 3432     1.8%
supply                 2627     1.4%
educational            2568     1.3%
non-alcoholic          2273     1.2%
for                    2233     1.2%
not                    1174     0.6%
Other values (71708)   127848   66.2%

Most occurring characters

ValueCountFrequency (%)
, 306324
 
9.7%
173179
 
5.5%
0 163740
 
5.2%
2 160186
 
5.1%
1 134751
 
4.3%
e 111836
 
3.5%
R 108212
 
3.4%
E 107140
 
3.4%
A 104196
 
3.3%
S 94417
 
3.0%
Other values (69) 1699957
53.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 1108553
35.0%
Decimal Number 760869
24.0%
Lowercase Letter 633394
20.0%
Other Punctuation 407021
 
12.9%
Space Separator 173191
 
5.5%
Dash Punctuation 80380
 
2.5%
Open Punctuation 346
 
< 0.1%
Close Punctuation 184
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 111836
17.7%
s 79178
12.5%
a 73377
11.6%
l 65129
10.3%
r 60411
9.5%
o 35019
 
5.5%
u 33269
 
5.3%
t 31370
 
5.0%
i 22799
 
3.6%
n 19310
 
3.0%
Other values (16) 101696
16.1%
Uppercase Letter
ValueCountFrequency (%)
R 108212
 
9.8%
E 107140
 
9.7%
A 104196
 
9.4%
S 94417
 
8.5%
T 78115
 
7.0%
C 77309
 
7.0%
O 70795
 
6.4%
F 67999
 
6.1%
P 63135
 
5.7%
I 57406
 
5.2%
Other values (16) 279829
25.2%
Other Punctuation
ValueCountFrequency (%)
, 306324
75.3%
" 37927
 
9.3%
. 36170
 
8.9%
& 21322
 
5.2%
* 4380
 
1.1%
/ 707
 
0.2%
' 102
 
< 0.1%
: 70
 
< 0.1%
@ 7
 
< 0.1%
# 6
 
< 0.1%
Other values (2) 6
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
0 163740
21.5%
2 160186
21.1%
1 134751
17.7%
3 64625
 
8.5%
5 46047
 
6.1%
7 43378
 
5.7%
4 41268
 
5.4%
8 37448
 
4.9%
9 36888
 
4.8%
6 32538
 
4.3%
Space Separator
ValueCountFrequency (%)
173179
> 99.9%
  12
 
< 0.1%
Dash Punctuation
ValueCountFrequency (%)
- 80380
100.0%
Open Punctuation
ValueCountFrequency (%)
( 346
100.0%
Close Punctuation
ValueCountFrequency (%)
) 184
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1741947
55.1%
Common 1421991
44.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 111836
 
6.4%
R 108212
 
6.2%
E 107140
 
6.2%
A 104196
 
6.0%
S 94417
 
5.4%
s 79178
 
4.5%
T 78115
 
4.5%
C 77309
 
4.4%
a 73377
 
4.2%
O 70795
 
4.1%
Other values (42) 837372
48.1%
Common
ValueCountFrequency (%)
, 306324
21.5%
173179
12.2%
0 163740
11.5%
2 160186
11.3%
1 134751
9.5%
- 80380
 
5.7%
3 64625
 
4.5%
5 46047
 
3.2%
7 43378
 
3.1%
4 41268
 
2.9%
Other values (17) 208113
14.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3163926
> 99.9%
None 12
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
, 306324
 
9.7%
173179
 
5.5%
0 163740
 
5.2%
2 160186
 
5.1%
1 134751
 
4.3%
e 111836
 
3.5%
R 108212
 
3.4%
E 107140
 
3.4%
A 104196
 
3.3%
S 94417
 
3.0%
Other values (68) 1699945
53.7%
None
ValueCountFrequency (%)
  12
100.0%

purpose
Text

MISSING 

Distinct: 5431
Distinct (%): 69.0%
Missing: 12052
Missing (%): 60.5%
Memory size: 155.8 KiB

Length

Max length: 95
Median length: 66
Mean length: 19.664592
Min length: 1

Characters and Unicode

Total characters: 154839
Distinct characters: 84
Distinct categories: 12
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 4650
Unique (%): 59.1%

Sample

1st rowBook: Canadian Urban Regions
2nd rowupdate#4 to looseleaf publication:Planning&Zoning
3rd rowLooseLeaf Publications updates
4th rowupdate to looseleaf publication inv#10288391
5th rowBook: Rapid Graphs with Tableau Software 6
Value                 Count   Frequency (%)
for                   968     4.0%
more                  957     4.0%
to                    333     1.4%
(blank)               291     1.2%
supplies              267     1.1%
and                   237     1.0%
subscription          215     0.9%
tweetymail            211     0.9%
monthly               209     0.9%
max                   206     0.9%
Other values (5092)   20164   83.8%

Most occurring characters

ValueCountFrequency (%)
17335
 
11.2%
E 14083
 
9.1%
R 10635
 
6.9%
T 9488
 
6.1%
A 9287
 
6.0%
S 9247
 
6.0%
O 8800
 
5.7%
I 8244
 
5.3%
L 6888
 
4.4%
N 6588
 
4.3%
Other values (74) 54244
35.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 122774
79.3%
Space Separator 17335
 
11.2%
Other Punctuation 5642
 
3.6%
Lowercase Letter 4638
 
3.0%
Decimal Number 3815
 
2.5%
Dash Punctuation 482
 
0.3%
Open Punctuation 71
 
< 0.1%
Close Punctuation 67
 
< 0.1%
Connector Punctuation 9
 
< 0.1%
Math Symbol 4
 
< 0.1%
Other values (2) 2
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E 14083
11.5%
R 10635
 
8.7%
T 9488
 
7.7%
A 9287
 
7.6%
S 9247
 
7.5%
O 8800
 
7.2%
I 8244
 
6.7%
L 6888
 
5.6%
N 6588
 
5.4%
C 5521
 
4.5%
Other values (16) 33993
27.7%
Lowercase Letter
ValueCountFrequency (%)
e 556
12.0%
a 462
 
10.0%
r 415
 
8.9%
o 364
 
7.8%
t 337
 
7.3%
s 283
 
6.1%
l 283
 
6.1%
n 275
 
5.9%
i 272
 
5.9%
c 189
 
4.1%
Other values (16) 1202
25.9%
Other Punctuation
ValueCountFrequency (%)
* 3937
69.8%
, 646
 
11.4%
/ 467
 
8.3%
. 213
 
3.8%
& 157
 
2.8%
" 131
 
2.3%
' 41
 
0.7%
# 28
 
0.5%
: 15
 
0.3%
@ 4
 
0.1%
Other values (2) 3
 
0.1%
Decimal Number
ValueCountFrequency (%)
1 804
21.1%
2 590
15.5%
0 572
15.0%
4 444
11.6%
3 337
8.8%
5 305
 
8.0%
8 269
 
7.1%
6 252
 
6.6%
7 139
 
3.6%
9 103
 
2.7%
Open Punctuation
ValueCountFrequency (%)
( 70
98.6%
[ 1
 
1.4%
Math Symbol
ValueCountFrequency (%)
+ 3
75.0%
= 1
 
25.0%
Space Separator
ValueCountFrequency (%)
17335
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 482
100.0%
Close Punctuation
ValueCountFrequency (%)
) 67
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 9
100.0%
Final Punctuation
ValueCountFrequency (%)
’ 1
100.0%
Currency Symbol
ValueCountFrequency (%)
$ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 127412
82.3%
Common 27427
 
17.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
E 14083
 
11.1%
R 10635
 
8.3%
T 9488
 
7.4%
A 9287
 
7.3%
S 9247
 
7.3%
O 8800
 
6.9%
I 8244
 
6.5%
L 6888
 
5.4%
N 6588
 
5.2%
C 5521
 
4.3%
Other values (42) 38631
30.3%
Common
ValueCountFrequency (%)
17335
63.2%
* 3937
 
14.4%
1 804
 
2.9%
, 646
 
2.4%
2 590
 
2.2%
0 572
 
2.1%
- 482
 
1.8%
/ 467
 
1.7%
4 444
 
1.6%
3 337
 
1.2%
Other values (22) 1813
 
6.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 154838
> 99.9%
Punctuation 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
17335
 
11.2%
E 14083
 
9.1%
R 10635
 
6.9%
T 9488
 
6.1%
A 9287
 
6.0%
S 9247
 
6.0%
O 8800
 
5.7%
I 8244
 
5.3%
L 6888
 
4.4%
N 6588
 
4.3%
Other values (73) 54243
35.0%
Punctuation
ValueCountFrequency (%)
’ 1
100.0%

merchant_name
Text

MISSING 

Distinct: 1338
Distinct (%): 16.6%
Missing: 11854
Missing (%): 59.5%
Memory size: 155.8 KiB

Length

Max length: 22
Median length: 19
Mean length: 16.675917
Min length: 2

Characters and Unicode

Total characters: 134608
Distinct characters: 39
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 697
Unique (%): 8.6%

Sample

1st rowindigo books music
2nd rowcarswell
3rd rowcarswell
4th rowrei lexisnexis canada
5th rowcreatespace
Value                 Count   Frequency (%)
home                  1260    5.6%
depot                 1259    5.6%
store                 857     3.8%
cdn                   749     3.3%
tire                  717     3.2%
supply                369     1.6%
lowes                 261     1.2%
paypal                243     1.1%
auto                  224     1.0%
tweetymail            216     1.0%
Other values (1953)   16328   72.6%

Most occurring characters

ValueCountFrequency (%)
14411
 
10.7%
e 12700
 
9.4%
t 9058
 
6.7%
o 8393
 
6.2%
a 8341
 
6.2%
s 7609
 
5.7%
r 6824
 
5.1%
i 6469
 
4.8%
n 6375
 
4.7%
l 5586
 
4.1%
Other values (29) 48842
36.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 105591
78.4%
Decimal Number 14604
 
10.8%
Space Separator 14411
 
10.7%
Uppercase Letter 2
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 12700
12.0%
t 9058
 
8.6%
o 8393
 
7.9%
a 8341
 
7.9%
s 7609
 
7.2%
r 6824
 
6.5%
i 6469
 
6.1%
n 6375
 
6.0%
l 5586
 
5.3%
p 5389
 
5.1%
Other values (16) 28847
27.3%
Decimal Number
ValueCountFrequency (%)
0 4362
29.9%
1 2253
15.4%
7 2093
14.3%
2 1177
 
8.1%
3 1040
 
7.1%
4 902
 
6.2%
6 801
 
5.5%
5 719
 
4.9%
9 704
 
4.8%
8 553
 
3.8%
Uppercase Letter
ValueCountFrequency (%)
T 1
50.0%
W 1
50.0%
Space Separator
ValueCountFrequency (%)
14411
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 105593
78.4%
Common 29015
 
21.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 12700
12.0%
t 9058
 
8.6%
o 8393
 
7.9%
a 8341
 
7.9%
s 7609
 
7.2%
r 6824
 
6.5%
i 6469
 
6.1%
n 6375
 
6.0%
l 5586
 
5.3%
p 5389
 
5.1%
Other values (18) 28849
27.3%
Common
ValueCountFrequency (%)
14411
49.7%
0 4362
 
15.0%
1 2253
 
7.8%
7 2093
 
7.2%
2 1177
 
4.1%
3 1040
 
3.6%
4 902
 
3.1%
6 801
 
2.8%
5 719
 
2.5%
9 704
 
2.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 134608
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
14411
 
10.7%
e 12700
 
9.4%
t 9058
 
6.7%
o 8393
 
6.2%
a 8341
 
6.2%
s 7609
 
5.7%
r 6824
 
5.1%
i 6469
 
4.8%
n 6375
 
4.7%
l 5586
 
4.1%
Other values (29) 48842
36.3%

cost_center_wbls_element_order_description
Text

MISSING

Distinct: 185
Distinct (%): 2.3%
Missing: 11865
Missing (%): 59.5%
Memory size: 155.8 KiB

Length

Max length: 50
Median length: 39
Mean length: 33.473266
Min length: 9

Characters and Unicode

Total characters: 269828
Distinct characters: 43
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 36
Unique (%): 0.4%

Sample

1st rowHEAD OFF-POLICY&RESR
2nd rowHEAD OFF-POLICY&RESR
3rd rowHEAD OFF-POLICY&RESR
4th rowHEAD OFF-POLICY&RESR
5th rowHEAD OFF-POLICY&RESR
Value                Count   Frequency (%)
(blank)              7726    17.6%
wtr                  2085    4.8%
treat                2077    4.7%
supply               1658    3.8%
roadway              1447    3.3%
tp                   1317    3.0%
ww                   1139    2.6%
transmission         1128    2.6%
ops                  1034    2.4%
treatmnt             948     2.2%
Other values (286)   23298   53.1%

Most occurring characters

ValueCountFrequency (%)
37195
13.8%
T 23800
 
8.8%
R 21486
 
8.0%
A 20103
 
7.5%
E 19556
 
7.2%
S 19047
 
7.1%
O 12836
 
4.8%
P 11537
 
4.3%
N 10700
 
4.0%
W 9718
 
3.6%
Other values (33) 83850
31.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 210820
78.1%
Space Separator 37195
 
13.8%
Other Punctuation 12255
 
4.5%
Dash Punctuation 5951
 
2.2%
Decimal Number 3607
 
1.3%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
T 23800
11.3%
R 21486
10.2%
A 20103
 
9.5%
E 19556
 
9.3%
S 19047
 
9.0%
O 12836
 
6.1%
P 11537
 
5.5%
N 10700
 
5.1%
W 9718
 
4.6%
I 9697
 
4.6%
Other values (16) 52340
24.8%
Decimal Number
ValueCountFrequency (%)
1 2043
56.6%
4 698
 
19.4%
2 432
 
12.0%
3 418
 
11.6%
0 7
 
0.2%
6 4
 
0.1%
5 2
 
0.1%
7 2
 
0.1%
8 1
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
& 5491
44.8%
: 5279
43.1%
, 1009
 
8.2%
/ 279
 
2.3%
. 159
 
1.3%
' 38
 
0.3%
Space Separator
ValueCountFrequency (%)
37195
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 5951
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 210820
78.1%
Common 59008
 
21.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
T 23800
11.3%
R 21486
10.2%
A 20103
 
9.5%
E 19556
 
9.3%
S 19047
 
9.0%
O 12836
 
6.1%
P 11537
 
5.5%
N 10700
 
5.1%
W 9718
 
4.6%
I 9697
 
4.6%
Other values (16) 52340
24.8%
Common
ValueCountFrequency (%)
37195
63.0%
- 5951
 
10.1%
& 5491
 
9.3%
: 5279
 
8.9%
1 2043
 
3.5%
, 1009
 
1.7%
4 698
 
1.2%
2 432
 
0.7%
3 418
 
0.7%
/ 279
 
0.5%
Other values (7) 213
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 269828
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
37195
13.8%
T 23800
 
8.8%
R 21486
 
8.0%
A 20103
 
7.5%
E 19556
 
7.2%
S 19047
 
7.1%
O 12836
 
4.8%
P 11537
 
4.3%
N 10700
 
4.0%
W 9718
 
3.6%
Other values (33) 83850
31.1%

card_posting_date
Text

MISSING 

Distinct: 1436
Distinct (%): 17.8%
Missing: 11854
Missing (%): 59.5%
Memory size: 155.8 KiB

Length

Max length: 10
Median length: 10
Mean length: 9.9993806
Min length: 5

Characters and Unicode

Total characters: 80715
Distinct characters: 12
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 555
Unique (%): 6.9%
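The mean length reported for card_posting_date equals total characters divided by the number of non-missing values. The sketch below reproduces it from the figures above and also measures how far the column falls short of every value being a full 10-character date; reading that shortfall as one malformed value is an inference from these numbers, not something the report states.

```python
# Figures copied from this report (dataset overview and card_posting_date).
n_observations = 19926
missing = 11854
total_characters = 80715

non_missing = n_observations - missing
mean_length = total_characters / non_missing

# Characters "lost" relative to every value being a 10-character date.
shortfall = 10 * non_missing - total_characters

print(f"non-missing values: {non_missing}")
print(f"mean length: {mean_length:.7f}")
print(f"shortfall vs. 10 chars per value: {shortfall}")
```

A shortfall of 5 characters, together with a minimum length of 5, is consistent with a single 5-character value among otherwise well-formed dates.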

Sample

1st row2011-06-27
2nd row2011-06-08
3rd row2011-05-24
4th row2011-05-24
5th row2011-05-16
Value                 Count   Frequency (%)
2018-04-26            38      0.5%
2017-05-23            38      0.5%
2018-09-20            38      0.5%
2018-05-03            37      0.5%
2018-01-11            36      0.4%
2017-10-02            34      0.4%
2017-05-29            34      0.4%
2018-05-17            33      0.4%
2018-04-05            32      0.4%
2018-01-10            32      0.4%
Other values (1426)   7720    95.6%

Most occurring characters

ValueCountFrequency (%)
0 17780
22.0%
- 16142
20.0%
1 15680
19.4%
2 13069
16.2%
8 5194
 
6.4%
7 4054
 
5.0%
6 1996
 
2.5%
3 1924
 
2.4%
5 1886
 
2.3%
4 1594
 
2.0%
Other values (2) 1396
 
1.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 64572
80.0%
Dash Punctuation 16142
 
20.0%
Other Punctuation 1
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 17780
27.5%
1 15680
24.3%
2 13069
20.2%
8 5194
 
8.0%
7 4054
 
6.3%
6 1996
 
3.1%
3 1924
 
3.0%
5 1886
 
2.9%
4 1594
 
2.5%
9 1395
 
2.2%
Dash Punctuation
ValueCountFrequency (%)
- 16142
100.0%
Other Punctuation
ValueCountFrequency (%)
. 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 80715
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 17780
22.0%
- 16142
20.0%
1 15680
19.4%
2 13069
16.2%
8 5194
 
6.4%
7 4054
 
5.0%
6 1996
 
2.5%
3 1924
 
2.4%
5 1886
 
2.3%
4 1594
 
2.0%
Other values (2) 1396
 
1.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 80715
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 17780
22.0%
- 16142
20.0%
1 15680
19.4%
2 13069
16.2%
8 5194
 
6.4%
7 4054
 
5.0%
6 1996
 
2.5%
3 1924
 
2.4%
5 1886
 
2.3%
4 1594
 
2.0%
Other values (2) 1396
 
1.7%

merchant_type_mcc
Text

MISSING 

Distinct: 164
Distinct (%): 2.0%
Missing: 11854
Missing (%): 59.5%
Memory size: 155.8 KiB

Length

Max length: 6
Median length: 6
Mean length: 5.9982656
Min length: 3

Characters and Unicode

Total characters: 48418
Distinct characters: 14
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 33
Unique (%): 0.4%

Sample

1st row5399.0
2nd row7338.0
3rd row7338.0
4th row5942.0
5th row7829.0
Value                Count   Frequency (%)
5200.0               2235    27.7%
5085.0               843     10.4%
5251.0               567     7.0%
9399.0               298     3.7%
5211.0               284     3.5%
7372.0               235     2.9%
5943.0               190     2.4%
5732.0               181     2.2%
5261.0               175     2.2%
4812.0               155     1.9%
Other values (154)   2909    36.0%
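The merchant_type_mcc values are stored as float-formatted text such as "5399.0", while merchant category codes are four-digit codes. The sketch below shows one possible way to normalize them before analysis; the helper name `normalize_mcc` and the choice to return `None` for anything that is not four digits are assumptions for illustration.

```python
import re
from typing import Optional

# Four digits, optionally followed by a float artifact such as ".0".
_MCC_FLOAT = re.compile(r"^(\d{4})(?:\.0+)?$")


def normalize_mcc(raw: Optional[str]) -> Optional[str]:
    """Return a 4-digit MCC string, or None if the value is missing or malformed."""
    if raw is None:
        return None
    match = _MCC_FLOAT.match(raw.strip())
    return match.group(1) if match else None


# "5399.0" and "5200.0" appear in this report; "CAD" is a hypothetical stray value.
for value in ["5399.0", "5200.0", "CAD", None]:
    print(repr(value), "->", repr(normalize_mcc(value)))
```

Returning `None` instead of raising keeps malformed entries visible as missing values, which may suit a column where the character counts above already hint at a few non-numeric entries.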

Most occurring characters

ValueCountFrequency (%)
0 14272
29.5%
5 8102
16.7%
. 8071
16.7%
2 4768
 
9.8%
1 3042
 
6.3%
9 2541
 
5.2%
3 2076
 
4.3%
7 1744
 
3.6%
8 1643
 
3.4%
4 1303
 
2.7%
Other values (4) 856
 
1.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 40344
83.3%
Other Punctuation 8071
 
16.7%
Uppercase Letter 3
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 14272
35.4%
5 8102
20.1%
2 4768
 
11.8%
1 3042
 
7.5%
9 2541
 
6.3%
3 2076
 
5.1%
7 1744
 
4.3%
8 1643
 
4.1%
4 1303
 
3.2%
6 853
 
2.1%
Uppercase Letter
ValueCountFrequency (%)
C 1
33.3%
A 1
33.3%
D 1
33.3%
Other Punctuation
ValueCountFrequency (%)
. 8071
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 48415
> 99.9%
Latin 3
 
< 0.1%

Most frequent character per script

Common
ValueCountFrequency (%)
0 14272
29.5%
5 8102
16.7%
. 8071
16.7%
2 4768
 
9.8%
1 3042
 
6.3%
9 2541
 
5.2%
3 2076
 
4.3%
7 1744
 
3.6%
8 1643
 
3.4%
4 1303
 
2.7%
Latin
ValueCountFrequency (%)
C 1
33.3%
A 1
33.3%
D 1
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 48418
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 14272
29.5%
5 8102
16.7%
. 8071
16.7%
2 4768
 
9.8%
1 3042
 
6.3%
9 2541
 
5.2%
3 2076
 
4.3%
7 1744
 
3.6%
8 1643
 
3.4%
4 1303
 
2.7%
Other values (4) 856
 
1.8%

merchant_type_description
Text

MISSING

Distinct: 164
Distinct (%): 2.0%
Missing: 11854
Missing (%): 59.5%
Memory size: 155.8 KiB

Length

Max length: 257
Median length: 40
Mean length: 29.34502
Min length: 5

Characters and Unicode

Total characters: 236873
Distinct characters: 68
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 33
Unique (%): 0.4%

Sample

1st rowMiscellaneous General Merchandise
2nd rowQuick Copy, Reproduction, and Blueprinti
3rd rowQuick Copy, Reproduction, and Blueprinti
4th rowBook Stores
5th rowMotion Picture and Video Tape Production
Value                Count   Frequency (%)
supply               2629    8.4%
home                 2309    7.4%
warehouse            2235    7.1%
stores               1633    5.2%
not                  1512    4.8%
elsewhere            1465    4.7%
and                  1452    4.6%
supplies             977     3.1%
classi               918     2.9%
industrial           843     2.7%
Other values (368)   15299   48.9%

Most occurring characters

ValueCountFrequency (%)
e 26935
 
11.4%
23203
 
9.8%
r 15835
 
6.7%
s 14997
 
6.3%
a 13642
 
5.8%
o 13204
 
5.6%
l 11677
 
4.9%
t 11504
 
4.9%
i 11385
 
4.8%
u 10135
 
4.3%
Other values (58) 84356
35.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 180751
76.3%
Uppercase Letter 29210
 
12.3%
Space Separator 23203
 
9.8%
Other Punctuation 2747
 
1.2%
Dash Punctuation 822
 
0.3%
Open Punctuation 82
 
< 0.1%
Decimal Number 56
 
< 0.1%
Close Punctuation 2
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 26935
14.9%
r 15835
 
8.8%
s 14997
 
8.3%
a 13642
 
7.5%
o 13204
 
7.3%
l 11677
 
6.5%
t 11504
 
6.4%
i 11385
 
6.3%
u 10135
 
5.6%
p 9214
 
5.1%
Other values (15) 42223
23.4%
Uppercase Letter
ValueCountFrequency (%)
S 8154
27.9%
H 3064
 
10.5%
C 2879
 
9.9%
E 2559
 
8.8%
W 2491
 
8.5%
N 1650
 
5.6%
I 1374
 
4.7%
P 1034
 
3.5%
G 988
 
3.4%
M 880
 
3.0%
Other values (14) 4137
14.2%
Decimal Number
ValueCountFrequency (%)
2 16
28.6%
1 13
23.2%
0 11
19.6%
4 5
 
8.9%
5 3
 
5.4%
7 3
 
5.4%
8 3
 
5.4%
6 1
 
1.8%
3 1
 
1.8%
Other Punctuation
ValueCountFrequency (%)
, 2587
94.2%
/ 130
 
4.7%
& 21
 
0.8%
" 4
 
0.1%
. 3
 
0.1%
' 2
 
0.1%
Space Separator
ValueCountFrequency (%)
23203
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 822
100.0%
Open Punctuation
ValueCountFrequency (%)
( 82
100.0%
Close Punctuation
ValueCountFrequency (%)
) 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 209961
88.6%
Common 26912
 
11.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 26935
 
12.8%
r 15835
 
7.5%
s 14997
 
7.1%
a 13642
 
6.5%
o 13204
 
6.3%
l 11677
 
5.6%
t 11504
 
5.5%
i 11385
 
5.4%
u 10135
 
4.8%
p 9214
 
4.4%
Other values (39) 71433
34.0%
Common
ValueCountFrequency (%)
23203
86.2%
, 2587
 
9.6%
- 822
 
3.1%
/ 130
 
0.5%
( 82
 
0.3%
& 21
 
0.1%
2 16
 
0.1%
1 13
 
< 0.1%
0 11
 
< 0.1%
4 5
 
< 0.1%
Other values (9) 22
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 236873
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 26935
 
11.4%
23203
 
9.8%
r 15835
 
6.7%
s 14997
 
6.3%
a 13642
 
5.8%
o 13204
 
5.6%
l 11677
 
4.9%
t 11504
 
4.9%
i 11385
 
4.8%
u 10135
 
4.3%
Other values (58) 84356
35.6%

original_currency1
Categorical

HIGH CORRELATION  IMBALANCE  MISSING 

Distinct: 5
Distinct (%): 0.1%
Missing: 11855
Missing (%): 59.5%
Memory size: 155.8 KiB

CAD: 7892
USD: 175
CHF: 2
GBP: 1
34.23: 1

Length

Max length: 5
Median length: 3
Mean length: 3.0002478
Min length: 3

Characters and Unicode

Total characters: 24215
Distinct characters: 14
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st rowCAD
2nd rowCAD
3rd rowCAD
4th rowCAD
5th rowUSD

Common Values

Value       Count   Frequency (%)
CAD         7892    39.6%
USD         175     0.9%
CHF         2       < 0.1%
GBP         1       < 0.1%
34.23       1       < 0.1%
(Missing)   11855   59.5%
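One of the five distinct values of original_currency1 is "34.23", which does not look like a currency code. The sketch below separates values shaped like three-letter uppercase codes from everything else; the three-letter pattern is only a shape check (it does not verify that a code is a real currency), and the helper name `split_by_shape` is hypothetical.

```python
import re
from collections import Counter

# Shape of a currency code: exactly three uppercase ASCII letters.
CODE_SHAPE = re.compile(r"[A-Z]{3}")


def split_by_shape(value_counts):
    """Split value counts into code-shaped values and suspect values."""
    valid, suspect = Counter(), Counter()
    for value, count in value_counts.items():
        target = valid if CODE_SHAPE.fullmatch(value) else suspect
        target[value] += count
    return valid, suspect


# Non-missing value counts copied from the table above.
observed = {"CAD": 7892, "USD": 175, "CHF": 2, "GBP": 1, "34.23": 1}
valid, suspect = split_by_shape(observed)

print("code-shaped:", dict(valid))
print("suspect:", dict(suspect))
```

A numeric string in a currency column may indicate a row whose fields were shifted during ingestion, so suspect values are worth inspecting at the row level rather than simply dropping.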

Length

Histogram of lengths of the category

Common Values (Plot)

Value   Count   Frequency (%)
cad     7892    97.8%
usd     175     2.2%
chf     2       < 0.1%
gbp     1       < 0.1%
34.23   1       < 0.1%

Most occurring characters

ValueCountFrequency (%)
D 8067
33.3%
C 7894
32.6%
A 7892
32.6%
U 175
 
0.7%
S 175
 
0.7%
H 2
 
< 0.1%
F 2
 
< 0.1%
3 2
 
< 0.1%
G 1
 
< 0.1%
B 1
 
< 0.1%
Other values (4) 4
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 24210
> 99.9%
Decimal Number 4
 
< 0.1%
Other Punctuation 1
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
D 8067
33.3%
C 7894
32.6%
A 7892
32.6%
U 175
 
0.7%
S 175
 
0.7%
H 2
 
< 0.1%
F 2
 
< 0.1%
G 1
 
< 0.1%
B 1
 
< 0.1%
P 1
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
3 2
50.0%
4 1
25.0%
2 1
25.0%
Other Punctuation
ValueCountFrequency (%)
. 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 24210
> 99.9%
Common 5
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
D 8067
33.3%
C 7894
32.6%
A 7892
32.6%
U 175
 
0.7%
S 175
 
0.7%
H 2
 
< 0.1%
F 2
 
< 0.1%
G 1
 
< 0.1%
B 1
 
< 0.1%
P 1
 
< 0.1%
Common
ValueCountFrequency (%)
3 2
40.0%
4 1
20.0%
. 1
20.0%
2 1
20.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 24215
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
D 8067
33.3%
C 7894
32.6%
A 7892
32.6%
U 175
 
0.7%
S 175
 
0.7%
H 2
 
< 0.1%
F 2
 
< 0.1%
3 2
 
< 0.1%
G 1
 
< 0.1%
B 1
 
< 0.1%
Other values (4) 4
 
< 0.1%

cost_center_wbls_element_order
Text

MISSING

Distinct: 133
Distinct (%): 1.6%
Missing: 11857
Missing (%): 59.5%
Memory size: 155.8 KiB

Length

Max length: 13
Median length: 6
Mean length: 6.039534
Min length: 5

Characters and Unicode

Total characters: 48733
Distinct characters: 23
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 25
Unique (%): 0.3%

Sample

1st rowUR0005
2nd rowUR0005
3rd rowUR0005
4th rowUR0005
5th rowUR0005
Value                Count   Frequency (%)
tw7070               952     11.8%
tp0219               582     7.2%
tw7072               425     5.3%
tw2040               335     4.2%
tw2060               329     4.1%
tw4080               318     3.9%
tp0333               296     3.7%
tw2035               292     3.6%
tp0124               274     3.4%
tw7075               252     3.1%
Other values (124)   4015    49.8%

Most occurring characters

ValueCountFrequency (%)
0 12169
25.0%
T 7958
16.3%
W 5350
11.0%
7 4087
 
8.4%
2 3729
 
7.7%
P 2777
 
5.7%
1 2549
 
5.2%
4 2385
 
4.9%
5 2262
 
4.6%
3 1858
 
3.8%
Other values (13) 3609
 
7.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 32377
66.4%
Uppercase Letter 16235
33.3%
Dash Punctuation 105
 
0.2%
Other Punctuation 15
 
< 0.1%
Space Separator 1
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 12169
37.6%
7 4087
 
12.6%
2 3729
 
11.5%
1 2549
 
7.9%
4 2385
 
7.4%
5 2262
 
7.0%
3 1858
 
5.7%
6 1551
 
4.8%
9 1184
 
3.7%
8 603
 
1.9%
Uppercase Letter
ValueCountFrequency (%)
T 7958
49.0%
W 5350
33.0%
P 2777
 
17.1%
C 91
 
0.6%
R 29
 
0.2%
U 21
 
0.1%
N 4
 
< 0.1%
O 3
 
< 0.1%
A 1
 
< 0.1%
E 1
 
< 0.1%
Dash Punctuation
ValueCountFrequency (%)
- 105
100.0%
Other Punctuation
ValueCountFrequency (%)
* 15
100.0%
Space Separator
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 32498
66.7%
Latin 16235
33.3%

Most frequent character per script

Common
ValueCountFrequency (%)
0 12169
37.4%
7 4087
 
12.6%
2 3729
 
11.5%
1 2549
 
7.8%
4 2385
 
7.3%
5 2262
 
7.0%
3 1858
 
5.7%
6 1551
 
4.8%
9 1184
 
3.6%
8 603
 
1.9%
Other values (3) 121
 
0.4%
Latin
ValueCountFrequency (%)
T 7958
49.0%
W 5350
33.0%
P 2777
 
17.1%
C 91
 
0.6%
R 29
 
0.2%
U 21
 
0.1%
N 4
 
< 0.1%
O 3
 
< 0.1%
A 1
 
< 0.1%
E 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 48733
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 12169
25.0%
T 7958
16.3%
W 5350
11.0%
7 4087
 
8.4%
2 3729
 
7.7%
P 2777
 
5.7%
1 2549
 
5.2%
4 2385
 
4.9%
5 2262
 
4.6%
3 1858
 
3.8%
Other values (13) 3609
 
7.4%

transaction_date
Text

MISSING 

Distinct: 1523
Distinct (%): 18.9%
Missing: 11855
Missing (%): 59.5%
Memory size: 155.8 KiB

Length

Max length: 10
Median length: 10
Mean length: 9.9992566
Min length: 4

Characters and Unicode

Total characters: 80704
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 618
Unique (%): 7.7%

Sample

1st row2011-06-26
2nd row2011-06-07
3rd row2011-05-20
4th row2011-05-20
5th row2011-05-14
Value                 Count   Frequency (%)
2018-01-09            32      0.4%
2017-07-11            32      0.4%
2018-01-05            30      0.4%
2017-06-05            29      0.4%
2017-11-14            29      0.4%
2018-09-20            28      0.3%
2018-03-14            28      0.3%
2017-06-14            27      0.3%
2017-08-17            26      0.3%
2018-02-02            26      0.3%
Other values (1513)   7784    96.4%
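transaction_date is profiled as Text even though its values look like ISO dates, and its minimum length of 4 suggests at least one entry is not a full date. The sketch below shows one possible way to parse such a column strictly and surface the entries that fail; the helper name `parse_iso_date` is hypothetical, and the value "2018" is an invented malformed example, not a value taken from the data.

```python
from datetime import date, datetime
from typing import Optional


def parse_iso_date(raw: Optional[str]) -> Optional[date]:
    """Parse a YYYY-MM-DD string; return None for missing or malformed input."""
    if raw is None:
        return None
    try:
        return datetime.strptime(raw.strip(), "%Y-%m-%d").date()
    except ValueError:
        return None


# The first two values appear in this report; "2018" is a hypothetical bad entry.
samples = ["2011-06-26", "2018-01-09", "2018", None]
parsed = [parse_iso_date(s) for s in samples]
failed = [s for s, p in zip(samples, parsed) if s is not None and p is None]

print("parsed:", parsed)
print("failed to parse:", failed)
```

Collecting the failures separately makes it possible to review them before deciding whether to treat them as missing.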

Most occurring characters

ValueCountFrequency (%)
0 17909
22.2%
- 16140
20.0%
1 15638
19.4%
2 13015
16.1%
8 5162
 
6.4%
7 4118
 
5.1%
6 1970
 
2.4%
5 1846
 
2.3%
3 1814
 
2.2%
4 1632
 
2.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 64564
80.0%
Dash Punctuation 16140
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 17909
27.7%
1 15638
24.2%
2 13015
20.2%
8 5162
 
8.0%
7 4118
 
6.4%
6 1970
 
3.1%
5 1846
 
2.9%
3 1814
 
2.8%
4 1632
 
2.5%
9 1460
 
2.3%
Dash Punctuation
ValueCountFrequency (%)
- 16140
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 80704
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 17909
22.2%
- 16140
20.0%
1 15638
19.4%
2 13015
16.1%
8 5162
 
6.4%
7 4118
 
5.1%
6 1970
 
2.4%
5 1846
 
2.3%
3 1814
 
2.2%
4 1632
 
2.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 80704
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 17909
22.2%
- 16140
20.0%
1 15638
19.4%
2 13015
16.1%
8 5162
 
6.4%
7 4118
 
5.1%
6 1970
 
2.4%
5 1846
 
2.3%
3 1814
 
2.2%
4 1632
 
2.0%

transaction_amount
Text

MISSING 

Distinct: 5653
Distinct (%): 70.0%
Missing: 11855
Missing (%): 59.5%
Memory size: 155.8 KiB

Length

Max length: 8
Median length: 7
Mean length: 5.3402305
Min length: 3

Characters and Unicode

Total characters: 43101
Distinct characters: 12
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 4640
Unique (%): 57.5%

Sample

1st row 60.85
2nd row 273.45
3rd row 281.85
4th row 91.3
5th row 45.82

Value | Count | Frequency (%)
145.0 | 56 | 0.7%
175.0 | 42 | 0.5%
50.0 | 35 | 0.4%
140.0 | 29 | 0.4%
40.0 | 26 | 0.3%
33.89 | 24 | 0.3%
95.31 | 21 | 0.3%
45.19 | 18 | 0.2%
248.6 | 18 | 0.2%
56.49 | 18 | 0.2%
Other values (5643) | 7784 | 96.4%

Most occurring characters

ValueCountFrequency (%)
. 8070
18.7%
1 5132
11.9%
2 4149
9.6%
5 3718
8.6%
4 3593
8.3%
3 3584
8.3%
6 3233
7.5%
7 3054
 
7.1%
8 3046
 
7.1%
9 3025
 
7.0%
Other values (2) 2497
 
5.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 35030
81.3%
Other Punctuation 8070
 
18.7%
Dash Punctuation 1
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 5132
14.7%
2 4149
11.8%
5 3718
10.6%
4 3593
10.3%
3 3584
10.2%
6 3233
9.2%
7 3054
8.7%
8 3046
8.7%
9 3025
8.6%
0 2496
7.1%
Other Punctuation
ValueCountFrequency (%)
. 8070
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1
100.0%
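
The category names in the tables above correspond to Unicode general categories: digits are `Nd` (Decimal Number), the period is `Po` (Other Punctuation), and the hyphen-minus is `Pd` (Dash Punctuation). A small sketch using Python's standard `unicodedata` module shows how such counts can be derived. The function name `category_counts` is an illustrative assumption, and the input is limited to the five sample rows shown for transaction_amount.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over all characters in the given strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# The five sample rows listed for transaction_amount.
sample = ["60.85", "273.45", "281.85", "91.3", "45.82"]
print(category_counts(sample))
```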

Most occurring scripts

ValueCountFrequency (%)
Common 43101
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
. 8070
18.7%
1 5132
11.9%
2 4149
9.6%
5 3718
8.6%
4 3593
8.3%
3 3584
8.3%
6 3233
7.5%
7 3054
 
7.1%
8 3046
 
7.1%
9 3025
 
7.0%
Other values (2) 2497
 
5.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 43101
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
. 8070
18.7%
1 5132
11.9%
2 4149
9.6%
5 3718
8.6%
4 3593
8.3%
3 3584
8.3%
6 3233
7.5%
7 3054
 
7.1%
8 3046
 
7.1%
9 3025
 
7.0%
Other values (2) 2497
 
5.8%

trx_currency
Categorical

HIGH CORRELATION  IMBALANCE  MISSING 

Distinct 2
Distinct (%) < 0.1%
Missing 11855
Missing (%) 59.5%
Memory size 155.8 KiB

Length

Max length 5
Median length 3
Mean length 3.0002478
Min length 3

Characters and Unicode

Total characters 24215
Distinct characters 8
Distinct categories 2
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1
Unique (%) < 0.1%

Sample

1st row CAD
2nd row CAD
3rd row CAD
4th row CAD
5th row CAD

Common Values

Value | Count | Frequency (%)
CAD | 8070 | 40.5%
False | 1 | < 0.1%
(Missing) | 11855 | 59.5%
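
The single `False` entry among 8070 `CAD` values does not have the shape of a currency code. A minimal sketch of a shape check follows, assuming currency codes are three uppercase ASCII letters (the form of ISO 4217 alphabetic codes). It tests the shape only, not whether a code is actually assigned, and the function name is an illustrative assumption.

```python
import re

# Shape of an alphabetic currency code: exactly three uppercase ASCII letters.
CURRENCY_SHAPE = re.compile(r"[A-Z]{3}")


def invalid_currency_codes(values):
    """Return non-missing values that are not three uppercase ASCII letters."""
    return [v for v in values if v is not None and not CURRENCY_SHAPE.fullmatch(v)]


print(invalid_currency_codes(["CAD", "CAD", "False", None]))
```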

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
cad | 8070 | > 99.9%
false | 1 | < 0.1%

Most occurring characters

ValueCountFrequency (%)
C 8070
33.3%
A 8070
33.3%
D 8070
33.3%
F 1
 
< 0.1%
a 1
 
< 0.1%
l 1
 
< 0.1%
s 1
 
< 0.1%
e 1
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 24211
> 99.9%
Lowercase Letter 4
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
C 8070
33.3%
A 8070
33.3%
D 8070
33.3%
F 1
 
< 0.1%
Lowercase Letter
ValueCountFrequency (%)
a 1
25.0%
l 1
25.0%
s 1
25.0%
e 1
25.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 24215
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
C 8070
33.3%
A 8070
33.3%
D 8070
33.3%
F 1
 
< 0.1%
a 1
 
< 0.1%
l 1
 
< 0.1%
s 1
 
< 0.1%
e 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 24215
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
C 8070
33.3%
A 8070
33.3%
D 8070
33.3%
F 1
 
< 0.1%
a 1
 
< 0.1%
l 1
 
< 0.1%
s 1
 
< 0.1%
e 1
 
< 0.1%

gl_account_description
Text

MISSING 

Distinct 165
Distinct (%) 2.0%
Missing 11855
Missing (%) 59.5%
Memory size 155.8 KiB

Length

Max length 40
Median length 38
Mean length 21.514434
Min length 1

Characters and Unicode

Total characters 173643
Distinct characters 36
Distinct categories 7
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 34
Unique (%) 0.4%

Sample

1st row BOOK & MAGAZINE SUBSCRIPTIONS
2nd row BOOK & MAGAZINE SUBSCRIPTIONS
3rd row BOOK & MAGAZINE SUBSCRIPTIONS
4th row BOOK & MAGAZINE SUBSCRIPTIONS
5th row BOOK & MAGAZINE SUBSCRIPTIONS
ValueCountFrequency (%)
4523
17.9%
supplies 2113
 
8.4%
general 1814
 
7.2%
hardware 1749
 
6.9%
materials 1147
 
4.5%
equipment 838
 
3.3%
parts 689
 
2.7%
machinery 670
 
2.7%
miscellaneous 646
 
2.6%
m 495
 
2.0%
Other values (238) 10586
41.9%

Most occurring characters

ValueCountFrequency (%)
E 22000
12.7%
17199
 
9.9%
A 14461
 
8.3%
R 13710
 
7.9%
S 12580
 
7.2%
I 11834
 
6.8%
L 9812
 
5.7%
N 9696
 
5.6%
P 7881
 
4.5%
T 7566
 
4.4%
Other values (26) 46904
27.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 151503
87.2%
Space Separator 17199
 
9.9%
Other Punctuation 3028
 
1.7%
Dash Punctuation 1858
 
1.1%
Open Punctuation 27
 
< 0.1%
Close Punctuation 27
 
< 0.1%
Decimal Number 1
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E 22000
14.5%
A 14461
 
9.5%
R 13710
 
9.0%
S 12580
 
8.3%
I 11834
 
7.8%
L 9812
 
6.5%
N 9696
 
6.4%
P 7881
 
5.2%
T 7566
 
5.0%
M 5483
 
3.6%
Other values (16) 36480
24.1%
Other Punctuation
ValueCountFrequency (%)
& 2845
94.0%
/ 126
 
4.2%
, 36
 
1.2%
. 20
 
0.7%
# 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
17199
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1858
100.0%
Open Punctuation
ValueCountFrequency (%)
( 27
100.0%
Close Punctuation
ValueCountFrequency (%)
) 27
100.0%
Decimal Number
ValueCountFrequency (%)
0 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 151503
87.2%
Common 22140
 
12.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
E 22000
14.5%
A 14461
 
9.5%
R 13710
 
9.0%
S 12580
 
8.3%
I 11834
 
7.8%
L 9812
 
6.5%
N 9696
 
6.4%
P 7881
 
5.2%
T 7566
 
5.0%
M 5483
 
3.6%
Other values (16) 36480
24.1%
Common
ValueCountFrequency (%)
17199
77.7%
& 2845
 
12.9%
- 1858
 
8.4%
/ 126
 
0.6%
, 36
 
0.2%
( 27
 
0.1%
) 27
 
0.1%
. 20
 
0.1%
0 1
 
< 0.1%
# 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 173643
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
E 22000
12.7%
17199
 
9.9%
A 14461
 
8.3%
R 13710
 
7.9%
S 12580
 
7.2%
I 11834
 
6.8%
L 9812
 
5.7%
N 9696
 
5.6%
P 7881
 
4.5%
T 7566
 
4.4%
Other values (26) 46904
27.0%

original_amount
Real number (ℝ)

HIGH CORRELATION  MISSING  SKEWED 

Distinct 5607
Distinct (%) 69.5%
Missing 11855
Missing (%) 59.5%
Infinite 0
Infinite (%) 0.0%
Mean 250.9653
Minimum 0.01
Maximum 201807
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 155.8 KiB

Quantile statistics

Minimum 0.01
5th percentile 8.395
Q1 41.595
Median 107.33
Q3 248.6
95th percentile 860.915
Maximum 201807
Range 201806.99
Interquartile range (IQR) 207.005
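
The range and IQR above follow directly from the other quantile statistics. The short sketch below reproduces the arithmetic with `decimal.Decimal` to avoid floating-point noise. The 1.5 × IQR upper fence is a common rule of thumb added here for illustration; it is an assumption and not something the report itself computes.

```python
from decimal import Decimal

minimum, maximum = Decimal("0.01"), Decimal("201807")
q1, q3 = Decimal("41.595"), Decimal("248.6")

value_range = maximum - minimum  # Range = Maximum - Minimum
iqr = q3 - q1                    # IQR = Q3 - Q1

# Illustrative outlier fence (Tukey's rule of thumb), not part of the report.
upper_fence = q3 + Decimal("1.5") * iqr

print(value_range, iqr, upper_fence)
```

Under that rule of thumb, the reported maximum of 201807 lies far above the fence, which is consistent with the SKEWED alert on this variable.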

Descriptive statistics

Standard deviation 2271.8371
Coefficient of variation (CV) 9.0523954
Kurtosis 7679.8877
Mean 250.9653
Median Absolute Deviation (MAD) 77.63
Skewness 86.568521
Sum 2025540.9
Variance 5161243.9
Monotonicity Not monotonic
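
Several of these figures are related by simple identities, which gives a quick consistency check. The sketch below assumes the non-missing count is the number of observations in the Overview (19926) minus the missing count for this variable (11855); the variable names are illustrative.

```python
import math

n = 19926 - 11855  # non-missing observations for original_amount
mean = 250.9653
std = 2271.8371

cv = std / mean      # coefficient of variation
variance = std ** 2  # variance is the squared standard deviation
total = mean * n     # sum is mean times count

# Compare with the reported values, allowing for rounding in the report.
print(math.isclose(cv, 9.0523954, rel_tol=1e-6))
print(math.isclose(variance, 5161243.9, rel_tol=1e-6))
print(math.isclose(total, 2025540.9, rel_tol=1e-6))
```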
Histogram with fixed size bins (bins=50)
Common Values

Value | Count | Frequency (%)
145 | 56 | 0.3%
50 | 48 | 0.2%
175 | 44 | 0.2%
140 | 29 | 0.1%
40 | 28 | 0.1%
33.89 | 24 | 0.1%
95.31 | 21 | 0.1%
45.19 | 18 | 0.1%
248.6 | 18 | 0.1%
56.49 | 18 | 0.1%
Other values (5597) | 7767 | 39.0%
(Missing) | 11855 | 59.5%

Minimum 10 values

Value | Count | Frequency (%)
0.01 | 1 | < 0.1%
0.49 | 1 | < 0.1%
0.66 | 1 | < 0.1%
0.78 | 1 | < 0.1%
0.86 | 1 | < 0.1%
0.96 | 1 | < 0.1%
1.07 | 1 | < 0.1%
1.2 | 1 | < 0.1%
1.23 | 1 | < 0.1%
1.32 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
201807 | 1 | < 0.1%
4013.52 | 1 | < 0.1%
3288.58 | 1 | < 0.1%
2998.17 | 1 | < 0.1%
2988.75 | 1 | < 0.1%
2959.72 | 1 | < 0.1%
2956.36 | 1 | < 0.1%
2948.74 | 2 | < 0.1%
2943.65 | 1 | < 0.1%
2932.35 | 1 | < 0.1%

division
Categorical

HIGH CORRELATION  IMBALANCE  MISSING 

Distinct 6
Distinct (%) 0.1%
Missing 11855
Missing (%) 59.5%
Memory size 155.8 KiB

Length

Max length 24
Median length 13
Mean length 16.109156
Min length 4

Characters and Unicode

Total characters 130017
Distinct characters 21
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1
Unique (%) < 0.1%

Sample

1st row URBAN PLANNING
2nd row URBAN PLANNING
3rd row URBAN PLANNING
4th row URBAN PLANNING
5th row URBAN PLANNING

Common Values

Value | Count | Frequency (%)
TORONTO WATER | 5277 | 26.5%
TRANSPORTATION SERVICES | 2234 | 11.2%
TRANSPORTATION | 532 | 2.7%
URBAN PLANNING | 21 | 0.1%
TREASURER | 6 | < 0.1%
2018 | 1 | < 0.1%
(Missing) | 11855 | 59.5%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
toronto | 5277 | 33.8%
water | 5277 | 33.8%
transportation | 2766 | 17.7%
services | 2234 | 14.3%
urban | 21 | 0.1%
planning | 21 | 0.1%
treasurer | 6 | < 0.1%
2018 | 1 | < 0.1%

Most occurring characters

ValueCountFrequency (%)
T 24135
18.6%
O 21363
16.4%
R 18359
14.1%
N 10893
8.4%
A 10857
8.4%
9766
7.5%
E 9757
7.5%
S 7240
 
5.6%
W 5277
 
4.1%
I 5021
 
3.9%
Other values (11) 7349
 
5.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 120247
92.5%
Space Separator 9766
 
7.5%
Decimal Number 4
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
T 24135
20.1%
O 21363
17.8%
R 18359
15.3%
N 10893
9.1%
A 10857
9.0%
E 9757
8.1%
S 7240
 
6.0%
W 5277
 
4.4%
I 5021
 
4.2%
P 2787
 
2.3%
Other values (6) 4558
 
3.8%
Decimal Number
ValueCountFrequency (%)
2 1
25.0%
0 1
25.0%
1 1
25.0%
8 1
25.0%
Space Separator
ValueCountFrequency (%)
9766
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 120247
92.5%
Common 9770
 
7.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
T 24135
20.1%
O 21363
17.8%
R 18359
15.3%
N 10893
9.1%
A 10857
9.0%
E 9757
8.1%
S 7240
 
6.0%
W 5277
 
4.4%
I 5021
 
4.2%
P 2787
 
2.3%
Other values (6) 4558
 
3.8%
Common
ValueCountFrequency (%)
9766
> 99.9%
2 1
 
< 0.1%
0 1
 
< 0.1%
1 1
 
< 0.1%
8 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 130017
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
T 24135
18.6%
O 21363
16.4%
R 18359
14.1%
N 10893
8.4%
A 10857
8.4%
9766
7.5%
E 9757
7.5%
S 7240
 
5.6%
W 5277
 
4.1%
I 5021
 
3.9%
Other values (11) 7349
 
5.7%

gl_account
Text

MISSING 

Distinct 214
Distinct (%) 2.7%
Missing 11855
Missing (%) 59.5%
Memory size 155.8 KiB

Length

Max length 6
Median length 4
Mean length 4.1058109
Min length 4

Characters and Unicode

Total characters 33138
Distinct characters 16
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 54
Unique (%) 0.7%

Sample

1st row 2020.0
2nd row 2020.0
3rd row 2020.0
4th row 2020.0
5th row 2020.0
ValueCountFrequency (%)
2710 1563
19.4%
2120 594
 
7.4%
2999 567
 
7.0%
2552 419
 
5.2%
3080 313
 
3.9%
2535 302
 
3.7%
2530 273
 
3.4%
2575 243
 
3.0%
216
 
2.7%
2715 192
 
2.4%
Other values (204) 3389
42.0%
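
The sample rows show `2020.0` while the frequency table lists plain four-digit codes, which suggests that some account codes were written out as floats. The sketch below is one possible normalisation, assuming that a trailing `.0` is a formatting artefact and never part of a real code. The function name is hypothetical; values that do not match the pattern are returned unchanged.

```python
import re

# Digits followed by ".0", ".00", ... are treated as a float-formatted integer code.
FLOAT_SUFFIX = re.compile(r"(\d+)\.0+")


def normalize_account_code(value):
    """Strip a trailing '.0' from float-formatted codes; leave other values as-is."""
    match = FLOAT_SUFFIX.fullmatch(value)
    return match.group(1) if match else value


for code in ["2020.0", "2710", "*****"]:
    print(code, "->", normalize_account_code(code))
```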

Most occurring characters

ValueCountFrequency (%)
2 7238
21.8%
0 5901
17.8%
1 3631
11.0%
5 3480
10.5%
9 2814
 
8.5%
4 2592
 
7.8%
7 2501
 
7.5%
3 1969
 
5.9%
* 1080
 
3.3%
8 1036
 
3.1%
Other values (6) 896
 
2.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 31735
95.8%
Other Punctuation 1399
 
4.2%
Lowercase Letter 3
 
< 0.1%
Uppercase Letter 1
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 7238
22.8%
0 5901
18.6%
1 3631
11.4%
5 3480
11.0%
9 2814
 
8.9%
4 2592
 
8.2%
7 2501
 
7.9%
3 1969
 
6.2%
8 1036
 
3.3%
6 573
 
1.8%
Lowercase Letter
ValueCountFrequency (%)
r 1
33.3%
u 1
33.3%
e 1
33.3%
Other Punctuation
ValueCountFrequency (%)
* 1080
77.2%
. 319
 
22.8%
Uppercase Letter
ValueCountFrequency (%)
T 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 33134
> 99.9%
Latin 4
 
< 0.1%

Most frequent character per script

Common
ValueCountFrequency (%)
2 7238
21.8%
0 5901
17.8%
1 3631
11.0%
5 3480
10.5%
9 2814
 
8.5%
4 2592
 
7.8%
7 2501
 
7.5%
3 1969
 
5.9%
* 1080
 
3.3%
8 1036
 
3.1%
Other values (2) 892
 
2.7%
Latin
ValueCountFrequency (%)
T 1
25.0%
r 1
25.0%
u 1
25.0%
e 1
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 33138
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2 7238
21.8%
0 5901
17.8%
1 3631
11.0%
5 3480
10.5%
9 2814
 
8.5%
4 2592
 
7.8%
7 2501
 
7.5%
3 1969
 
5.9%
* 1080
 
3.3%
8 1036
 
3.1%
Other values (6) 896
 
2.7%

batch_transaction_id
Text

MISSING 

Distinct 8063
Distinct (%) 99.9%
Missing 11855
Missing (%) 59.5%
Memory size 155.8 KiB

Length

Max length 8
Median length 8
Mean length 7.441581
Min length 5

Characters and Unicode

Total characters 60061
Distinct characters 16
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 8055
Unique (%) 99.8%

Sample

1st row 1521-44
2nd row 1494-34
3rd row 1471-36
4th row 1471-37
5th row 1461-33

Value | Count | Frequency (%)
5020-56 | 2 | < 0.1%
5020-46 | 2 | < 0.1%
2481-8 | 2 | < 0.1%
2560-310 | 2 | < 0.1%
4228-237 | 2 | < 0.1%
4228-81 | 2 | < 0.1%
4228-139 | 2 | < 0.1%
5021-121 | 2 | < 0.1%
1365-34 | 1 | < 0.1%
1329-39 | 1 | < 0.1%
Other values (8053) | 8053 | 99.8%

Most occurring characters

ValueCountFrequency (%)
4 8348
13.9%
- 8070
13.4%
1 6864
11.4%
5 5615
9.3%
2 4916
8.2%
9 4708
7.8%
3 4560
7.6%
0 4397
7.3%
7 4302
7.2%
8 4273
7.1%
Other values (6) 4008
6.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 51986
86.6%
Dash Punctuation 8070
 
13.4%
Lowercase Letter 4
 
< 0.1%
Uppercase Letter 1
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
4 8348
16.1%
1 6864
13.2%
5 5615
10.8%
2 4916
9.5%
9 4708
9.1%
3 4560
8.8%
0 4397
8.5%
7 4302
8.3%
8 4273
8.2%
6 4003
7.7%
Lowercase Letter
ValueCountFrequency (%)
a 1
25.0%
l 1
25.0%
s 1
25.0%
e 1
25.0%
Dash Punctuation
ValueCountFrequency (%)
- 8070
100.0%
Uppercase Letter
ValueCountFrequency (%)
F 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 60056
> 99.9%
Latin 5
 
< 0.1%

Most frequent character per script

Common
ValueCountFrequency (%)
4 8348
13.9%
- 8070
13.4%
1 6864
11.4%
5 5615
9.3%
2 4916
8.2%
9 4708
7.8%
3 4560
7.6%
0 4397
7.3%
7 4302
7.2%
8 4273
7.1%
Latin
ValueCountFrequency (%)
F 1
20.0%
a 1
20.0%
l 1
20.0%
s 1
20.0%
e 1
20.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 60061
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
4 8348
13.9%
- 8070
13.4%
1 6864
11.4%
5 5615
9.3%
2 4916
8.2%
9 4708
7.8%
3 4560
7.6%
0 4397
7.3%
7 4302
7.2%
8 4273
7.1%
Other values (6) 4008
6.7%

transaction_gt_50
Boolean

MISSING 

Distinct 2
Distinct (%) < 0.1%
Missing 11855
Missing (%) 59.5%
Memory size 155.8 KiB

Value | Count | Frequency (%)
True | 4207 | 21.1%
False | 3864 | 19.4%
(Missing) | 11855 | 59.5%

transaction_day_of_week
Categorical

HIGH CORRELATION  MISSING 

Distinct 8
Distinct (%) 0.1%
Missing 11855
Missing (%) 59.5%
Memory size 155.8 KiB

Length

Max length 5
Median length 1
Mean length 1.0004956
Min length 1

Characters and Unicode

Total characters 8075
Distinct characters 12
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1
Unique (%) < 0.1%

Sample

1st row 6
2nd row 1
3rd row 4
4th row 4
5th row 5

Common Values

Value | Count | Frequency (%)
3 | 1694 | 8.5%
1 | 1684 | 8.5%
2 | 1676 | 8.4%
4 | 1489 | 7.5%
0 | 1227 | 6.2%
5 | 184 | 0.9%
6 | 116 | 0.6%
False | 1 | < 0.1%
(Missing) | 11855 | 59.5%
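
The codes 0 to 6 appear to follow the Monday = 0 convention used by Python's `date.weekday()`: the first two sample transaction dates (2011-06-26 and 2011-06-07) map to the first two sample codes here (6 and 1), and the counts for codes 0 to 4 add up to the 7770 `True` values reported for weekday_transaction. The sketch below checks both observations; the helper names are illustrative assumptions, and the derivation used by the dataset's authors is not stated in the report.

```python
from datetime import date


def day_of_week(iso_date):
    """Return the weekday code for an ISO date string (Monday=0 ... Sunday=6)."""
    return date.fromisoformat(iso_date).weekday()


def is_weekday(iso_date):
    """True for Monday to Friday under the Monday=0 convention."""
    return day_of_week(iso_date) < 5


# Counts copied from the Common Values table above.
counts = {0: 1227, 1: 1684, 2: 1676, 3: 1694, 4: 1489, 5: 184, 6: 116}
weekday_total = sum(c for code, c in counts.items() if code < 5)
weekend_total = sum(c for code, c in counts.items() if code >= 5)

print(day_of_week("2011-06-26"), day_of_week("2011-06-07"))
print(weekday_total, weekend_total)
```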

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
3 | 1694 | 21.0%
1 | 1684 | 20.9%
2 | 1676 | 20.8%
4 | 1489 | 18.4%
0 | 1227 | 15.2%
5 | 184 | 2.3%
6 | 116 | 1.4%
false | 1 | < 0.1%

Most occurring characters

ValueCountFrequency (%)
3 1694
21.0%
1 1684
20.9%
2 1676
20.8%
4 1489
18.4%
0 1227
15.2%
5 184
 
2.3%
6 116
 
1.4%
F 1
 
< 0.1%
a 1
 
< 0.1%
l 1
 
< 0.1%
Other values (2) 2
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 8070
99.9%
Lowercase Letter 4
 
< 0.1%
Uppercase Letter 1
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 1694
21.0%
1 1684
20.9%
2 1676
20.8%
4 1489
18.5%
0 1227
15.2%
5 184
 
2.3%
6 116
 
1.4%
Lowercase Letter
ValueCountFrequency (%)
a 1
25.0%
l 1
25.0%
s 1
25.0%
e 1
25.0%
Uppercase Letter
ValueCountFrequency (%)
F 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 8070
99.9%
Latin 5
 
0.1%

Most frequent character per script

Common
ValueCountFrequency (%)
3 1694
21.0%
1 1684
20.9%
2 1676
20.8%
4 1489
18.5%
0 1227
15.2%
5 184
 
2.3%
6 116
 
1.4%
Latin
ValueCountFrequency (%)
F 1
20.0%
a 1
20.0%
l 1
20.0%
s 1
20.0%
e 1
20.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 8075
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3 1694
21.0%
1 1684
20.9%
2 1676
20.8%
4 1489
18.4%
0 1227
15.2%
5 184
 
2.3%
6 116
 
1.4%
F 1
 
< 0.1%
a 1
 
< 0.1%
l 1
 
< 0.1%
Other values (2) 2
 
< 0.1%

transaction_month
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct 96
Distinct (%) 1.2%
Missing 11856
Missing (%) 59.5%
Infinite 0
Infinite (%) 0.0%
Mean 201678.24
Minimum 201101
Maximum 201812
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 155.8 KiB

Quantile statistics

Minimum 201101
5th percentile 201112
Q1 201705
Median 201711
Q3 201805
95th percentile 201811
Maximum 201812
Range 711
Interquartile range (IQR) 100
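
Because transaction_month is a YYYYMM code stored as a number, the range of 711 and the IQR of 100 describe the encoding rather than elapsed time. The sketch below converts the codes to calendar months before differencing; the function names are illustrative assumptions.

```python
def split_yyyymm(code):
    """Split a YYYYMM code such as 201106 into (year, month)."""
    year, month = divmod(int(code), 100)
    if not 1 <= month <= 12:
        raise ValueError(f"not a valid YYYYMM code: {code!r}")
    return year, month


def months_between(start, end):
    """Number of calendar months from one YYYYMM code to another."""
    start_year, start_month = split_yyyymm(start)
    end_year, end_month = split_yyyymm(end)
    return (end_year - start_year) * 12 + (end_month - start_month)


print(months_between(201101, 201812))  # span between reported minimum and maximum
print(months_between(201705, 201805))  # span between reported Q1 and Q3
```

The minimum and maximum are 95 months apart, so the period covers 96 calendar months, which matches the 96 distinct values reported for this variable.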

Descriptive statistics

Standard deviation 194.13604
Coefficient of variation (CV) 0.00096260281
Kurtosis 2.4483956
Mean 201678.24
Median Absolute Deviation (MAD) 93
Skewness -1.869686
Sum 1.6275434 × 10^9
Variance 37688.803
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Common Values

Value | Count | Frequency (%)
201801 | 427 | 2.1%
201706 | 392 | 2.0%
201805 | 384 | 1.9%
201804 | 369 | 1.9%
201711 | 337 | 1.7%
201803 | 329 | 1.7%
201708 | 319 | 1.6%
201710 | 318 | 1.6%
201707 | 313 | 1.6%
201802 | 311 | 1.6%
Other values (86) | 4571 | 22.9%
(Missing) | 11856 | 59.5%

Minimum 10 values

Value | Count | Frequency (%)
201101 | 59 | 0.3%
201102 | 39 | 0.2%
201103 | 66 | 0.3%
201104 | 55 | 0.3%
201105 | 58 | 0.3%
201106 | 43 | 0.2%
201107 | 16 | 0.1%
201108 | 16 | 0.1%
201109 | 13 | 0.1%
201110 | 10 | 0.1%

Maximum 10 values

Value | Count | Frequency (%)
201812 | 193 | 1.0%
201811 | 304 | 1.5%
201810 | 289 | 1.5%
201809 | 299 | 1.5%
201808 | 290 | 1.5%
201807 | 273 | 1.4%
201806 | 265 | 1.3%
201805 | 384 | 1.9%
201804 | 369 | 1.9%
201803 | 329 | 1.7%

transaction_year
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct 8
Distinct (%) 0.1%
Missing 11856
Missing (%) 59.5%
Infinite 0
Infinite (%) 0.0%
Mean 2016.7154
Minimum 2011
Maximum 2018
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 155.8 KiB

Quantile statistics

Minimum 2011
5th percentile 2011
Q1 2017
Median 2017
Q3 2018
95th percentile 2018
Maximum 2018
Range 7
Interquartile range (IQR) 1

Descriptive statistics

Standard deviation 1.9386167
Coefficient of variation (CV) 0.00096127431
Kurtosis 2.4164105
Mean 2016.7154
Median Absolute Deviation (MAD) 1
Skewness -1.858976
Sum 16274893
Variance 3.7582346
Monotonicity Not monotonic
Histogram with fixed size bins (bins=8)
Common Values

Value | Count | Frequency (%)
2018 | 3733 | 18.7%
2017 | 2607 | 13.1%
2011 | 410 | 2.1%
2016 | 375 | 1.9%
2015 | 288 | 1.4%
2014 | 264 | 1.3%
2012 | 255 | 1.3%
2013 | 138 | 0.7%
(Missing) | 11856 | 59.5%

Minimum values

Value | Count | Frequency (%)
2011 | 410 | 2.1%
2012 | 255 | 1.3%
2013 | 138 | 0.7%
2014 | 264 | 1.3%
2015 | 288 | 1.4%
2016 | 375 | 1.9%
2017 | 2607 | 13.1%
2018 | 3733 | 18.7%

Maximum values

Value | Count | Frequency (%)
2018 | 3733 | 18.7%
2017 | 2607 | 13.1%
2016 | 375 | 1.9%
2015 | 288 | 1.4%
2014 | 264 | 1.3%
2013 | 138 | 0.7%
2012 | 255 | 1.3%
2011 | 410 | 2.1%

weekday_transaction
Boolean

HIGH CORRELATION  IMBALANCE  MISSING 

Distinct 2
Distinct (%) < 0.1%
Missing 11856
Missing (%) 59.5%
Memory size 155.8 KiB

Value | Count | Frequency (%)
True | 7770 | 39.0%
False | 300 | 1.5%
(Missing) | 11856 | 59.5%

transaction_to_original_diff
Boolean

HIGH CORRELATION  IMBALANCE  MISSING 

Distinct 2
Distinct (%) < 0.1%
Missing 11856
Missing (%) 59.5%
Memory size 155.8 KiB

Value | Count | Frequency (%)
False | 7892 | 39.6%
True | 178 | 0.9%
(Missing) | 11856 | 59.5%

currency_change
Boolean

HIGH CORRELATION  IMBALANCE  MISSING 

Distinct 2
Distinct (%) < 0.1%
Missing 11856
Missing (%) 59.5%
Memory size 155.8 KiB

Value | Count | Frequency (%)
False | 7892 | 39.6%
True | 178 | 0.9%
(Missing) | 11856 | 59.5%

transaction_freq_gt_weekly
Categorical

HIGH CORRELATION  IMBALANCE  MISSING 

Distinct 2
Distinct (%) < 0.1%
Missing 11856
Missing (%) 59.5%
Memory size 155.8 KiB

Length

Max length 195
Median length 5
Mean length 5.023544
Min length 5

Characters and Unicode

Total characters 40540
Distinct characters 46
Distinct categories 6
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1
Unique (%) < 0.1%

Sample

1st row False
2nd row False
3rd row False
4th row False
5th row False

Common Values

Value | Count | Frequency (%)
False | 8069 | 40.5%
F0,"Game, Toy, and Hobby Shops",CAD,P04663,2012-08-15,205.52,CAD,RECREATIONAL & EDUCATIONAL SUPPLIES,205.52,"PARKS, FORESTRY & RECREATION ",2600,2125-115,True,2,201208,2012,True,False,False,False | 1 | < 0.1%
(Missing) | 11856 | 59.5%
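
The single non-`False` value is a 195-character string that looks like a fragment of a comma-separated record rather than a category, which would also explain why this flag is typed Categorical instead of Boolean. The sketch below splits the string with Python's standard `csv` module. Only the string itself is taken from the report; the reading that two records were fused at load time is an assumption.

```python
import csv

# The stray value exactly as it appears in the Common Values table.
raw = (
    'F0,"Game, Toy, and Hobby Shops",CAD,P04663,2012-08-15,205.52,CAD,'
    'RECREATIONAL & EDUCATIONAL SUPPLIES,205.52,"PARKS, FORESTRY & RECREATION ",'
    '2600,2125-115,True,2,201208,2012,True,False,False,False'
)

# csv.reader honours the double quotes, so commas inside quoted fields are kept.
fields = next(csv.reader([raw]))

print(len(raw), len(fields))
for position, field in enumerate(fields):
    print(position, repr(field))
```

The string splits into 20 fields, fewer than the 25 variables in the dataset, so it cannot be a complete record on its own.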

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
false 8069
99.9%
2
 
< 0.1%
f0,"game 1
 
< 0.1%
toy 1
 
< 0.1%
and 1
 
< 0.1%
hobby 1
 
< 0.1%
shops",cad,p04663,2012-08-15,205.52,cad,recreational 1
 
< 0.1%
educational 1
 
< 0.1%
supplies,205.52,"parks 1
 
< 0.1%
forestry 1
 
< 0.1%
Other values (2) 2
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
e 8075
19.9%
F 8074
19.9%
a 8074
19.9%
s 8073
19.9%
l 8072
19.9%
, 22
 
0.1%
2 14
 
< 0.1%
11
 
< 0.1%
0 11
 
< 0.1%
A 8
 
< 0.1%
Other values (36) 106
 
0.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 32310
79.7%
Uppercase Letter 8140
 
20.1%
Decimal Number 46
 
0.1%
Other Punctuation 30
 
0.1%
Space Separator 11
 
< 0.1%
Dash Punctuation 3
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
F 8074
99.2%
A 8
 
0.1%
T 7
 
0.1%
E 7
 
0.1%
R 7
 
0.1%
C 5
 
0.1%
S 5
 
0.1%
O 4
 
< 0.1%
I 4
 
< 0.1%
P 4
 
< 0.1%
Other values (8) 15
 
0.2%
Lowercase Letter
ValueCountFrequency (%)
e 8075
25.0%
a 8074
25.0%
s 8073
25.0%
l 8072
25.0%
o 3
 
< 0.1%
y 2
 
< 0.1%
r 2
 
< 0.1%
u 2
 
< 0.1%
b 2
 
< 0.1%
d 1
 
< 0.1%
Other values (4) 4
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
2 14
30.4%
0 11
23.9%
1 7
15.2%
5 7
15.2%
6 3
 
6.5%
8 2
 
4.3%
3 1
 
2.2%
4 1
 
2.2%
Other Punctuation
ValueCountFrequency (%)
, 22
73.3%
" 4
 
13.3%
. 2
 
6.7%
& 2
 
6.7%
Space Separator
ValueCountFrequency (%)
11
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 40450
99.8%
Common 90
 
0.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 8075
20.0%
F 8074
20.0%
a 8074
20.0%
s 8073
20.0%
l 8072
20.0%
A 8
 
< 0.1%
T 7
 
< 0.1%
E 7
 
< 0.1%
R 7
 
< 0.1%
C 5
 
< 0.1%
Other values (22) 48
 
0.1%
Common
ValueCountFrequency (%)
, 22
24.4%
2 14
15.6%
11
12.2%
0 11
12.2%
1 7
 
7.8%
5 7
 
7.8%
" 4
 
4.4%
- 3
 
3.3%
6 3
 
3.3%
. 2
 
2.2%
Other values (4) 6
 
6.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 40540
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 8075
19.9%
F 8074
19.9%
a 8074
19.9%
s 8073
19.9%
l 8072
19.9%
, 22
 
0.1%
2 14
 
< 0.1%
11
 
< 0.1%
0 11
 
< 0.1%
A 8
 
< 0.1%
Other values (36) 106
 
0.3%

Interactions

[Figures: 9 interaction plots; images not preserved]

Correlations

variable | original_amount | transaction_month | transaction_year | original_currency1 | trx_currency | division | transaction_gt_50 | transaction_day_of_week | weekday_transaction | transaction_to_original_diff | currency_change | transaction_freq_gt_weekly
original_amount | 1.000 | 0.041 | 0.035 | 1.000 | 0.500 | 1.000 | 0.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
transaction_month | 0.041 | 1.000 | 0.932 | 0.000 | 1.000 | 0.439 | 0.031 | 0.049 | 0.108 | 0.025 | 0.025 | 0.000
transaction_year | 0.035 | 0.932 | 1.000 | 0.000 | 1.000 | 0.439 | 0.031 | 0.049 | 0.108 | 0.025 | 0.025 | 0.000
original_currency1 | 1.000 | 0.000 | 0.000 | 1.000 | 1.000 | 0.501 | 0.045 | 0.500 | 0.022 | 1.000 | 1.000 | 0.000
trx_currency | 0.500 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
division | 1.000 | 0.439 | 0.439 | 0.501 | 1.000 | 1.000 | 0.013 | 0.450 | 0.108 | 0.080 | 0.080 | 0.000
transaction_gt_50 | 0.000 | 0.031 | 0.031 | 0.045 | 0.000 | 0.013 | 1.000 | 0.046 | 0.040 | 0.045 | 0.045 | 0.000
transaction_day_of_week | 1.000 | 0.049 | 0.049 | 0.500 | 1.000 | 0.450 | 0.046 | 1.000 | 1.000 | 0.044 | 0.044 | 0.000
weekday_transaction | 1.000 | 0.108 | 0.108 | 0.022 | 1.000 | 0.108 | 0.040 | 1.000 | 1.000 | 0.024 | 0.024 | 0.000
transaction_to_original_diff | 1.000 | 0.025 | 0.025 | 1.000 | 1.000 | 0.080 | 0.045 | 0.044 | 0.024 | 1.000 | 0.997 | 0.000
currency_change | 1.000 | 0.025 | 0.025 | 1.000 | 1.000 | 0.080 | 0.045 | 0.044 | 0.024 | 0.997 | 1.000 | 0.000
transaction_freq_gt_weekly | 1.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
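
One common way to compute a nullity correlation is the Pearson correlation between per-column missing-value indicators (1 where a value is missing, 0 otherwise). The sketch below implements that in plain Python. It is an assumption that the heatmap was produced this way, and the two toy columns are illustrative, not rows from the dataset.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missing-value indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # undefined when a column is never or always missing
    return cov / math.sqrt(var_a * var_b)


# Toy columns that are missing on exactly the same rows.
amount = [60.85, None, 281.85, None]
currency = ["CAD", None, "CAD", None]
print(nullity_correlation(amount, currency))
```

A value near 1 means two columns tend to be missing on the same rows, which is the pattern suggested by the nearly identical missing counts (11855 and 11856) reported across the variables above.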

Sample

(index) | Unnamed: 0 | purpose | merchant_name | cost_center_wbls_element_order_description | card_posting_date | merchant_type_mcc | merchant_type_description | original_currency1 | cost_center_wbls_element_order | transaction_date | transaction_amount | trx_currency | gl_account_description | original_amount | division | gl_account | batch_transaction_id | transaction_gt_50 | transaction_day_of_week | transaction_month | transaction_year | weekday_transaction | transaction_to_original_diff | currency_change | transaction_freq_gt_weekly
0 | 1003 | Book: Canadian Urban Regions | indigo books music | HEAD OFF-POLICY&RESR | 2011-06-27 | 5399.0 | Miscellaneous General Merchandise | CAD | UR0005 | 2011-06-26 | 60.85 | CAD | BOOK & MAGAZINE SUBSCRIPTIONS | 60.85 | URBAN PLANNING | 2020.0 | 1521-44 | False | 6 | 201106.0 | 2011.0 | False | False | False | False
1 | 1002 | update#4 to looseleaf publication:Planning&Zoning | carswell | HEAD OFF-POLICY&RESR | 2011-06-08 | 7338.0 | Quick Copy, Reproduction, and Blueprinti | CAD | UR0005 | 2011-06-07 | 273.45 | CAD | BOOK & MAGAZINE SUBSCRIPTIONS | 273.45 | URBAN PLANNING | 2020.0 | 1494-34 | True | 1 | 201106.0 | 2011.0 | True | False | False | False
2 | 895 | LooseLeaf Publications updates | carswell | HEAD OFF-POLICY&RESR | 2011-05-24 | 7338.0 | Quick Copy, Reproduction, and Blueprinti | CAD | UR0005 | 2011-05-20 | 281.85 | CAD | BOOK & MAGAZINE SUBSCRIPTIONS | 281.85 | URBAN PLANNING | 2020.0 | 1471-36 | True | 4 | 201105.0 | 2011.0 | True | False | False | False
3 | 896 | update to looseleaf publication inv#10288391 | rei lexisnexis canada | HEAD OFF-POLICY&RESR | 2011-05-24 | 5942.0 | Book Stores | CAD | UR0005 | 2011-05-20 | 91.3 | CAD | BOOK & MAGAZINE SUBSCRIPTIONS | 91.30 | URBAN PLANNING | 2020.0 | 1471-37 | False | 4 | 201105.0 | 2011.0 | True | False | False | False
4 | 894 | Book: Rapid Graphs with Tableau Software 6 | createspace | HEAD OFF-POLICY&RESR | 2011-05-16 | 7829.0 | Motion Picture and Video Tape Production | USD | UR0005 | 2011-05-14 | 45.82 | CAD | BOOK & MAGAZINE SUBSCRIPTIONS | 45.99 | URBAN PLANNING | 2020.0 | 1461-33 | False | 5 | 201105.0 | 2011.0 | False | True | True | False
5 | 857 | update to looseleaf publication inv#7304047 | carswell | HEAD OFF-POLICY&RESR | 2011-04-15 | 7338.0 | Quick Copy, Reproduction, and Blueprinti | CAD | UR0005 | 2011-04-14 | 291.3 | CAD | BOOK & MAGAZINE SUBSCRIPTIONS | 291.30 | URBAN PLANNING | 2020.0 | 1417-27 | True | 3 | 201104.0 | 2011.0 | True | False | False | False
6 | 856 | Book: Everyday Ethnics for Practicing Planners | apa bookstore | HEAD OFF-POLICY&RESR | 2011-04-08 | 8299.0 | Educational Services | USD | UR0005 | 2011-04-07 | 63.11 | CAD | BOOK & MAGAZINE SUBSCRIPTIONS | 63.95 | URBAN PLANNING | 2020.0 | 1406-50 | False | 3 | 201104.0 | 2011.0 | True | True | True | False
7 | 854 | Book order #73422064 | abebookscom | HEAD OFF-POLICY&RESR | 2011-04-07 | 5192.0 | Books, Periodicals, and Newspapers | USD | UR0005 | 2011-04-06 | 45.07 | CAD | BOOK & MAGAZINE SUBSCRIPTIONS | 46.91 | URBAN PLANNING | 2020.0 | 1404-41 | False | 2 | 201104.0 | 2011.0 | True | True | True | False
8 | 1030 | Book:Inclusionary Housing In inter.Perspective | lincoln inst land plcy | HEAD OFF-POLICY&RESR | 2011-03-31 | 8299.0 | Educational Services | USD | UR0005 | 2011-03-30 | 42.74 | CAD | BOOK & MAGAZINE SUBSCRIPTIONS | 42.85 | URBAN PLANNING | 2020 | 1392-45 | False | 2 | 201103.0 | 2011.0 | True | True | True | False
9 | 1029 | Book: Conference Proceedings | paypal makingcitie | HEAD OFF-POLICY&RESR | 2011-03-11 | 8999.0 | Professional Services - Not Elsewhere Cl | CAD | UR0005 | 2011-03-09 | 69.62 | CAD | CONFERENCES/SEMINARS - REGISTRATION FEES | 69.62 | URBAN PLANNING | 4256 | 1365-34 | False | 2 | 201103.0 | 2011.0 | True | False | False | False
| Unnamed: 0 | purpose | merchant_name | cost_center_wbls_element_order_description | card_posting_date | merchant_type_mcc | merchant_type_description | original_currency1 | cost_center_wbls_element_order | transaction_date | transaction_amount | trx_currency | gl_account_description | original_amount | division | gl_account | batch_transaction_id | transaction_gt_50 | transaction_day_of_week | transaction_month | transaction_year | weekday_transaction | transaction_to_original_diff | currency_change | transaction_freq_gt_weekly
19916 | 331,*****,home depot 7013,SCARLETT WOODS-OPERA,2012-07-09,5200.0,Home Supply Warehouse,CAD,P07648,2012-07-06,21.38,CAD,GENERAL HARDWARE,21.38,"PARKS, FORESTRY & RECREATION ",2710,2066-19,False,4,201207,2012,True,False,False,False | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
19917 | 332,DOG FOOD FOR DOG COURSE DOG,cdn tire store 00182,SCARLETT WOODS-OPERA,2012-07-10,5200.0,Home Supply Warehouse,CAD,P07648,2012-07-06,81.33,CAD,ANIMAL CARE SUPPLIES,81.33,"PARKS, FORESTRY & RECREATION ",2620,2069-12,False,4,201207,2012,True,False,False,False | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
19918 | 359,ADMISSION FEES -SUMMER CAMP ACTIVITY,montgomerys inn,BLOORDALE CS-SUMR CA,2012-07-09,9399.0,Government Services - Not Elsewhere Clas,CAD,P05092,2012-07-06,103.44,CAD,TICKETS AND ADMISSION FEES,103.44,"PARKS, FORESTRY & RECREATION ",4118,2066-20,True,4,201207,2012,True,False,False,False | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
19919 | 360,ADMISSION FEES -SUMMER CAMP ACTIVITY,wild water kingdom,BLOORDALE CS-SUMR CA,2012-07-10,7996.0,"Amusement Parks, Carnivals, Circuses, Fo",CAD,P05092,2012-07-06,103.0,CAD,TICKETS AND ADMISSION FEES,103.0,"PARKS, FORESTRY & RECREATION ",4118,2069-13,True,4,201207,2012,True,False,False,False | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
19920 | 361,ADMISSION FEES -SUMMER CAMP ACTIVITY,wild water kingdom,BLOORDALE CS-SUMR CA,2012-07-10,7996.0,"Amusement Parks, Carnivals, Circuses, Fo",CAD,P05092,2012-07-06,1052.2,CAD,TICKETS AND ADMISSION FEES,1052.2,"PARKS, FORESTRY & RECREATION ",4118,2069-14,True,4,201207,2012,True,False,False,False | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
19921 | 503,,alpine lawn garden e,TECH SERV. 1-FLEET E,2012-07-09,5261.0,Lawn and Garden Supply Stores,CAD,P00830,2012-07-06,30.5,CAD,MISCELLANEOUS PARTS,30.5,"PARKS, FORESTRY & RECREATION ",2199,2066-26,False,4,201207,2012,True,False,False,False | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
19922 | 613,SUPPLIES FOR COOKING,walmart 3159,SUMMRCMP-STPHENLCKRC,2012-07-09,5411.0,"Grocery Stores, Supermarkets",CAD,P13279,2012-07-06,178.58,CAD,FOOD & NON-ALCOHOLIC BEVERAGES,178.58,"PARKS, FORESTRY & RECREATION ",2750,2066-30,True,4,201207,2012,True,False,False,False | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
19923 | 614,FOOD FOR SPECIAL EVENT,walmart 3159,SUMMRCMP-STPHENLCKRC,2012-07-09,5411.0,"Grocery Stores, Supermarkets",CAD,P13279,2012-07-06,48.2,CAD,FOOD & NON-ALCOHOLIC BEVERAGES,48.2,"PARKS, FORESTRY & RECREATION ",2750,2066-31,False,4,201207,2012,True,False,False,False | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
19924 | 615,CAMP SUPPLIES,walmart 3159,SUMMRCMP-STPHENLCKRC,2012-07-09,5411.0,"Grocery Stores, Supermarkets",CAD,P13279,2012-07-06,13.96,CAD,RECREATIONAL & EDUCATIONAL SUPPLIES,13.96,"PARKS, FORESTRY & RECREATION ",2600,2066-32,False,4,201207,2012,True,False,False,False | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
19925 | 616,TOYS & PRIZES FOR CAMP PARTICIPANTS,dollar store,SUMMRCMP-STPHENLCKRC,2012-07-10,5999.0,Miscellaneous and Specialty Retail Store,CAD,P13279,2012-07-06,9.61,CAD,RECREATIONAL & EDUCATIONAL SUPPLIES,9.61,"PARKS, FORESTRY & RECREATION ",2600,2069-20,False,4,201207,2012,True,False,False,False | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN